(57) Abstract

A region-based system, method and architecture for encoding and decoding digital still images to produce a scalable, content-based, 
randomly accessible compressed bit stream is disclosed. According to the system, raw image data is decomposed and ordered into a 
hierarchy of multi-resolution sub-images. Regions of interest are then determined. A region mask is defined to identify the regions of
interest and then encoded. This data is then sorted on the basis of the magnitude of the multi-resolution coefficients to produce the scalable, 
content-based randomly accessible compressed bit stream. 
REGION-BASED SCALABLE IMAGE CODING

Field of the Invention 

The present invention relates generally to image coding, and more particularly to compression 
and decompression of scalable and content-based, randomly accessible digital still images.

Background of the Invention 

The fast growth of Internet and digital multimedia applications has created a consistent and 
growing demand for new image coding tools that reduce the usually large and cumbersome raw 
image data files into a compressed form. Compactness of the resulting bit-stream, however, is no 
longer the only requirement asked of developers when devising new coding tools. End users and
their applications are increasingly demanding features like scalability, error robustness, and 
content-based accessibility. 

Photographs or motion picture film are two-dimensional representations of three-dimensional 
objects viewed by the human eye. These methods of recording two-dimensional versions are
"continuous" or "analog" reproductions. Digital images are discontinuous approximations of these
analog images made up of a series of adjacent dots or picture elements (pixels) of varying color
or intensity. On a computer or television monitor, the digital image is presented by pixels
projected onto a glass screen and viewed by the operator. The number of pixels dedicated to the
portrayal of a particular image is called its resolution, i.e. the more pixels used to portray a given
object, the higher its resolution.

A monotone image (black and white images are called "grayscale") of moderate resolution
might consist of 640 pixels per horizontal line. A typical image would include 480 horizontal
rows or lines with each of these containing 640 pixels per line. Therefore, a total of 307,200
pixels are displayed in a single 640 x 480 pixel image. If each pixel of the monotone
image requires one byte of data to describe it (i.e. either black or white), a total of 307,200
bytes are required to describe just one black and white image. Modern grayscale images use
different levels of intensity to portray darkness and thus use eight bits or 256 levels of gray.
The resulting image files are therefore correspondingly larger.

For color images, the color of each pixel in an image is typically determined by three variables:
red (R), green (G), and blue (B). By mixing these three variables in different proportions, a
computer can display different colors of the spectrum. The more variety available to represent
each of the three colors, the more colors can be displayed. In order to represent, for example,
256 shades of red, an 8-bit number is needed. The range of the values of such a color is thus
0-255. The total number of bits needed to represent a pixel is therefore 24 bits (8 bits each for
red, green, and blue), commonly known as RGB888 format. Thus, a given RGB picture has
three planes, the red, the green, and the blue, and the range of the colors for each pixel in the
picture is 0 - 16.78 million, or R x G x B = 256 x 256 x 256. A standard color image of 640 x
480 pixels therefore requires approximately 7.4 megabits of data to be stored or represented in
a computer system. This number is arrived at by multiplying the horizontal and vertical
resolution by the number of required bits to represent the full color range: 640 x 480 x 24 =
7,372,800 bits.

Standard, commonly available hardware, while increasingly fast and affordable, still finds files
of this size slow and unwieldy. This is especially true in the case of interactive applications and
Internet use. Interactive applications demand very fast multi-directional processing of
multimedia data. Given their persistently large size, image files have been a rate limiting factor in
the development of realistic, interactive computer applications. In the case of the Internet,
end-users and applications are further limited by the slow pace of modems and other
transmission media. For example, the amount of information currently capable of being
transmitted over a telephone line in the interval of one second is restricted to 33,600 bits
due to the actual wires and switching functions used by the typical telephone company.
Therefore, a single, full color RGB888 640x480 pixel page, with its 7,372,800 bits of data,
would take approximately three and one half minutes to transfer at this rate.
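
The arithmetic in the two preceding paragraphs can be checked with a short sketch. The variable names are illustrative only and do not come from the specification; the figures are the ones quoted in the text.

```python
# Illustrative arithmetic only; names are hypothetical, numbers are from the text.
WIDTH, HEIGHT = 640, 480
BITS_PER_PIXEL = 24        # RGB888: 8 bits each for red, green and blue
LINE_RATE_BPS = 33_600     # telephone-line rate quoted in the text

raw_bits = WIDTH * HEIGHT * BITS_PER_PIXEL
transfer_seconds = raw_bits / LINE_RATE_BPS

print(raw_bits)                          # 7372800
print(round(transfer_seconds / 60, 2))   # 3.66
```

The result, a little over three and a half minutes, agrees with the approximate figure given above.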

Many methods of compressing image data exist and are well known to those skilled in the art.
Some of these methods are known as "lossless" compression; that is, upon decoding and
decompressing they restore the original data without any loss or elimination of data. Because
their relative reduction ratios are small, however, these lossless techniques cannot satisfy all the
current demands for image compression technologies. Other compression methods exist that
are nonreversible and known as "lossy". These nonreversible methods can offer considerable
compression, but do result in a loss of data. In image files, the high compression rates are
actually achieved by eliminating certain aspects of the image, usually those to which the human
eye has limited or no sensitivity. After coding, an inverse process is performed on the reduced
data set to decompress and restore a reasonable facsimile of the original image. Lossy
compression techniques may also be customized for a variable mix of data
compression and image fidelity.

Compactness of a compressed bit-stream is usually measured by the size of the stream in
comparison to the size of the corresponding uncompressed image data. A quantitative measure
of the compactness is the compression ratio, or alternatively, the bit-rate, where:

compression ratio = (total bytes of the original raw image data) / (total bytes required for
compressed image)

and

bit-rate = (total bytes required for decompression) / (pixel number of the original image)

In general, the higher the compression ratio (or the lower the bit-rate), the higher the 
compactness of a compressed bit-stream. Compactness has been always a primary concern for 
all data compression techniques. 
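
As a hedged illustration of the two measures, the sketch below computes both for a hypothetical file. The function names and the example sizes are assumptions made here. Note also that the sketch reports the bit-rate in bits per pixel (byte count times 8), which is the common convention, whereas the formula above is stated in bytes.

```python
def compression_ratio(raw_bytes: int, compressed_bytes: int) -> float:
    """Raw size divided by compressed size; higher means more compact."""
    return raw_bytes / compressed_bytes

def bit_rate(compressed_bytes: int, pixel_count: int) -> float:
    """Compressed bits per pixel (assumes 8 bits per byte); lower means more compact."""
    return 8 * compressed_bytes / pixel_count

# Hypothetical example: a 640 x 480 RGB888 image compressed to one tenth of its size.
pixels = 640 * 480
raw = pixels * 3            # 921,600 bytes of raw data
compressed = raw // 10      # 92,160 bytes (assumed, not a measured result)

print(compression_ratio(raw, compressed))   # 10.0
print(bit_rate(compressed, pixels))         # 2.4
```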

One of the most popular formats for compressed image files is the GIF format. GIF stands for
"Graphics Interchange Format", and was developed by CompuServe to provide a means of passing an
image from one dial-up customer to another, even across different computer hardware
platforms. It is a relatively old format, and was designed to handle a palette of 256 colors (8
bit as opposed to 24 bit color). When developed, this was near state of the art for most
personal computers.

The "GIF" format uses an 8 bit Color Look Up Table (sometimes called a CLUT) to identify
color values. If the original image is an 8 bit, gray-scale photo, then the "GIF" format
produces a compressed lossless image file. A gray scale image typically has only 256 levels of
gray. The operative compression is accomplished by the "Run Length Encoding" (RLE)
mechanism of compressing the information while saving a GIF file. If the original file were a
24 bit color graphic image, then it would first be mapped to an 8 bit CLUT, and then
compressed using RLE. The loss would be in the remapping of the original 24 bit (16.7
million) colors to the limited 8 bit (256 colors) CLUT. RLE encoding would reproduce an
uncompressed image that was identical to the remapped 8 bit image, but not the same as the
original 24 bit image. RLE is not an efficient way of compressing an image when there are
many changes in the coloration across a line of pixels. It is very efficient when there are rows
of pixels with the same color or [illegible].
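
The behavior of run length encoding described above can be seen in a minimal sketch. This is a generic RLE written for illustration, not the encoder of any particular file format, and the function names are assumptions.

```python
from itertools import groupby

def rle_encode(row):
    """Return (value, run_length) pairs for one row of pixel values."""
    return [(value, len(list(run))) for value, run in groupby(row)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original row."""
    return [value for value, count in pairs for _ in range(count)]

flat_row = [7] * 12                               # one color across the whole row
busy_row = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]   # color changes at every pixel

print(len(rle_encode(flat_row)))   # 1 pair describes all 12 pixels
print(len(rle_encode(busy_row)))   # 12 pairs, more data than the raw row
```

Decoding always restores the row exactly, which is why the only loss in the scenario above comes from the remapping to the 8 bit table.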

The other de facto standard of still image formats is the JPEG format. JPEG stands for Joint
Photographic Experts Group. JPEG uses a lossy compression method to create the final file.
JPEG files can be further compressed than their GIF relations, and they can maintain more
color depth than the 8 bit table used in the GIF format. Most JPEG compression software
provides the user with a choice between image quality and the amount of compression. At
compression ratios of 10:1 most images look very much like the original, and maintain
excellent full color rendition. If pressed to 100:1 the images tend to contain blocky image
artifacts that substantially reduce quality. Unlike GIF, JPEG does not use RLE alone to
compress the image; it uses a progressive set of tools to achieve the final file.



WO 00/04721    PCT/CA99/00641



JPEG first changes the image from its original color space to a normalized color space (a lossy 
process) based on the luminance and chrominance of the image. Luminance corresponds to the 
brightness information while chrominance corresponds to hue information. Testing has 
indicated that the human eye is more sensitive to changes in brightness than changes in color 
or hue. The data is reordered in 8 x 8 pixel blocks using the Discrete Cosine Transform
(DCT), and this too produces some image loss. It effectively re-samples the image in these
discrete areas, and then uses a more standard RLE encoding (as well as other encoding
schemes) to produce the final file. The higher the ratio of encoding, the more image loss, and
the 8 x 8 pixel artifacts become more noticeable.

One of the requirements of evolving technologies is that they possess the
characteristic/attribute of scalability. Scalability measures the extent to which a compressed
bit-stream is capable of being partially decoded and utilized at the terminal end of the 
transmission. In meeting this need of progressive processing, scalability has become a standard 
requirement for the new generation of digital image coding technology. Typically, scalabilities 
in terms of pixel precision and of spatial resolution are, among others, two basic requirements 
for still image compression. 

To achieve scalability while ensuring image fidelity, recent developments in image
compression technology have incorporated multi-resolution decompositions based on
"wavelets". Wavelets are mathematical functions, first widely considered in academic
applications only after the Second World War. The name wavelet is derived from the fact that
the basis function, or the "mother wavelet", generally integrates to zero, thus "waving" about
the X-axis. Other characteristics, like the fact that wavelets are orthonormal or symmetric,
ensure quick and easy calculation of the direct and inverse wavelet transform, which is especially
useful in decoding.

Another important advantage to wavelet based transforms is the fact that many classes of
signals or images can be represented in a compact way. For example, signals
with discontinuities and images with sharp spikes usually take substantially fewer wavelet basis
functions than sine or cosine based functions to achieve the same precision. This implies that a
wavelet-based method has the potential to achieve higher image compression ratios. For the same
precision, the images that are reconstructed from wavelet coefficients look better than the
images obtained using a Fourier (sine or cosine) transform. This appears to indicate that the
wavelet scheme produces images more closely sympathetic to the human visual system.

A wavelet transform converts the image into a coarse, low resolution version of the original and a
series of enhancements that add finer and finer detail to the image. This multi-resolution
property is well suited for networked applications where scalability and graceful degradation
are required. For example, a heterogeneous network may include very high bandwidth parts as
well as 28.8 modem connections and everything in between. It would be nice to send the same
video signal to all parts of the network, dropping finer details and sending a low resolution
image to the parts of the network with low bandwidth. Wavelets are well suited to this
application by wrapping the coarse, low resolution image in the highest priority packets which
would reach the entire network. The enhancements belong in lower priority packets that may
be dropped in lower bandwidth parts of the network.
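
The split into a coarse version plus droppable enhancements can be sketched with the simplest wavelet, the Haar wavelet, on a one-dimensional signal. The Haar wavelet is used here only as a stand-in chosen for brevity; it is an assumption of this sketch and not the wavelet of the specification, and the function names are hypothetical.

```python
def haar_step(signal):
    """Split an even-length signal into pairwise averages (coarse) and differences (detail)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    coarse = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return coarse, detail

def haar_inverse(coarse, detail):
    """Rebuild the signal from the coarse values and the detail values."""
    out = []
    for c, d in zip(coarse, detail):
        out.extend([c + d, c - d])
    return out

signal = [10, 12, 14, 16, 80, 82, 84, 86]
coarse, detail = haar_step(signal)

full = haar_inverse(coarse, detail)                 # all packets received
low_res = haar_inverse(coarse, [0] * len(detail))   # enhancement packets dropped

print(full)      # [10.0, 12.0, 14.0, 16.0, 80.0, 82.0, 84.0, 86.0]
print(low_res)   # [11.0, 11.0, 15.0, 15.0, 81.0, 81.0, 85.0, 85.0]
```

A receiver that only gets the coarse values still obtains a usable low resolution signal, which is the graceful degradation described above.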

This multi-resolution property of the coded image also supports graceful degradation in a 
noisy communications channel such as a wireless network or a sick network. The high priority 
packets containing the low resolution base image would be retransmitted while the 
enhancements would be discarded if errors occur. 

Content-based coding and accessibility is a further, new dimension within the realm of image
compression. The ability to specify and manipulate specific regions of an image is not
supported by previously disclosed coding techniques such as JPEG. Nor is content-based
random accessibility a claimed functionality within any of the new wavelet based technologies.
End user applications that require this feature include multimedia database query, Internet
server-client interaction, image content production and editing, remote medical diagnostics,
and interactive entertainment, to name a few.

Content-based query to multimedia databases requires the support of the mechanism that 
locates those imagery materials where an interested object is present. Content-based hyperlink 
to Internet or local disk sites makes desired objects within an image serve as entry points for
information navigation. Content-based editing enables a content producer to manipulate the
attributes of the image materials in an object-oriented or region-based manner. Content-based
interaction allows a digital content subscriber or a remote researcher to selectively control the
image information transmission based on their regions of interest. In short, this content-based
accessibility allows semantically meaningful visual objects to be used as the basis for image
data representation, explanation, manipulation, and retrieval. 

Summary of the Invention

It is an object of the present invention to provide region-based coding in image compression. 
In accordance with an aspect of the instant invention there is provided a region-based method
for encoding and decoding digital still images to produce a scalable, content accessible
compressed bit stream comprising the steps: decomposing and ordering the raw image data
into a hierarchy of multi-resolution sub-images; determining regions of interest; defining a
region mask to identify regions of interest; encoding region masks for regions of interest;
determining region masks for subsequent levels of resolution; and scanning and progressively
sorting the region data on the basis of the magnitude of the multi-resolution coefficients.

In accordance with a further aspect of the instant invention there is provided an apparatus for 
the region-based encoding and decoding of digital still images that produces a scalable, 
content accessible compressed bit stream comprising: a means of decomposing and ordering 
the raw image data into a hierarchy of multi-resolution sub-images; means of determining 
regions of interest; means of defining a region mask to identify regions of interest; means of 
encoding region masks for regions of interest; means of determining region masks for
subsequent levels of resolution; and a means for scanning and progressively sorting the region
data on the basis of the magnitude of the multi-resolution coefficients.

In accordance with yet a further aspect of the instant invention there is provided a
region-based system for encoding and decoding digital still images that produces a scalable, content
accessible compressed bit stream and comprises the steps: decomposing and ordering the raw
image data into a hierarchy of multi-resolution sub-images; determining regions of interest;
defining a region mask to identify regions of interest; encoding region masks for regions of
interest; determining region masks for subsequent levels of resolution; and scanning and
progressively sorting the region data on the basis of the magnitude of the multi-resolution
coefficients.

Brief Description of the Figures

The present invention will be better understood when considered in conjunction with the
following figures and description in which like terms are used to indicate like features. 

Figure 1 is a detailed multi-path flow representation of the compression system and
architecture.

Figure 2 is a representation of the multi-resolution decomposition hierarchy, obtained
using a wavelet based transformation, of the image "Lena". 

Figure 3 is a schematic representation of the invention's "geometric" approach to the coding of
regions of interest.

Figure 4 is a graphic representation of the concept of "the leading one" as it applies to the
coding of regions of interest.

Figure 5 is a representation of three types of region formation schemes as applied to the still
image "Lena".

Figure 6 is a representation of the coding of the regions of importance using a Discrete Cosine
Transform (DCT) as applied to the still image "Lena". 

Figure 7 is a flow diagram of the method of region hierarchy formation. 

Figure 8 is a flow diagram of the operation of algorithm A51 and the down sampling of region
masks for subsequent resolution levels. 

Figure 9 is a representation of two different methods of scanning the region-encoded data.

Figure 10 is a flow diagram of a preferred method of scanning the region data using the region
shrinking method. 

Figure 11 is a detailed flow diagram of the order in which data is packed within the 
multiplexer on the compression side of the system. 

Figure 12 is a flow diagram of the internal architecture of the multiplexer of the compression 
system.

Figure 13 is a flow diagram of the internal architecture of the de-multiplexer on the 
decompression side of the system. 

Figure 14 is a detailed multi-path flow representation of the decompression system and
architecture. 

Detailed Description of the Preferred Embodiment

Figure 1 presents the overall architecture of the method and system for image data
compression. In the preferred embodiment of the invention the raw image data enters the
system as a bitmap image, is processed by the system of the present invention and exits as a
compressed bitstream.

The first step in the compression encoding process is the transformation or decomposition of
the raw data into a multiresolution decomposition hierarchy or MDH. The preferred
embodiment of the present invention applies a discrete wavelet transform to achieve this
decomposition. The reader will appreciate that other transforms are available and can be
equally well utilized in the present invention. Further, this resolution-based decomposition
need not necessarily be performed to accomplish the content accessible compression of raw
image data. The present invention is based on a modular architecture capable of processing
data in many different formats.

After the multiresolution decomposition, the next stage of the preferred embodiment is the
region formation and coding of the MDH data. The reader will note that this step may be
applied to raw image data, or data that has been transformed into a multi-resolution hierarchy
using a variety of techniques. This step of the system is broken into two components, the
formation, or determination, of the Regions Hierarchy and the subsequent coding of these
region shapes. This data forms the Multiple Region Data Channels that enter the next stage in
the system of the present invention.

After the data has been coded on the basis of its "regional" priorities, the data must once again
be sorted to preserve scalability for the end user. The progressive sorting of the "regionalized"
data is the system's unique and novel method to efficiently and compressibly organize the data
to preserve the fidelity of the image, its scalability and the content-based accessibility.

After the sorting stage of the system is completed, entropy coding of the data is then 
performed. Entropy coding is a lossless method of data compression well known in the art. It
is based on methods of statistical prediction and further contributes to the compact nature of 
the final data stream. 

Finally, a multiplexing or MUX module is included to manage the flow of different types of
data resulting from the previous steps of the process. The multiplexer of the present invention
allows the user to set the "bit budget" of the data flowing to the decompressor by way of
progressive transmission control. The requirement for this feature may be imposed by the
limited resources available for transmission of the data, or those available to the end user for
processing. After multiplexing, the resulting compressed bitstream is transmitted through
a variety of media to the decoding component of the invention.

Figure 2 is a graphic illustration of the multi-resolution decomposition of the
present invention. As mentioned previously, there are several different methods available to
decompose or transform raw image data so that different levels of resolution may be
organized. The reader will recall that this is to achieve the hierarchy desired for scalable and/or
gracefully degraded transmission. The different types of transforms currently available include
wavelets, KL transforms, wavelet packet transforms, lifting schemes, windowed Fourier
transforms, and discrete cosine transforms. In the preferred embodiment of the present
invention the particular wavelet used is based on a lifting scheme. It will be appreciated by one
skilled in the art, however, that the architecture of the present invention supports other wavelets
or perhaps other transforms designed for the particular purposes of an end user.

In Figure 2 we see typical results from a multi-resolution based transformation of the data set
Ix,y using the wavelet of the preferred embodiment. The test image "Lena" has been
transformed into a hierarchy of data based on levels of resolution, presented in three spatial
orientations. This is the "multi-resolution decomposition hierarchy" or MDH data set. The
present invention performs, by way of default, either 3 or 5 different levels of decomposition.
In Figure 2 we further see that at each level of resolution, 3 spatial orientations are
represented by HL, HH, and LH, where: HL represents a high pass scan on the horizontal
plane with a low pass scan on the vertical, HH denotes a high pass scan on both planes and
LH is a low pass scan on the horizontal with a high pass on the vertical. An LL, or low pass
scan in both planes, would present meaningless information at any particular level of resolution
but may be interpreted by the subsequent resolution level in the hierarchy.
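
The HL, LH and HH naming can be made concrete with one level of a two-dimensional decomposition. The sketch below again substitutes the Haar wavelet for the lifting-scheme wavelet of the preferred embodiment, purely as an assumption for illustration; the function names are hypothetical, and the subband labels follow the convention stated above (first letter horizontal, second letter vertical).

```python
def split_rows(image):
    """Horizontal pass: low-pass (average) and high-pass (difference) along each row."""
    low = [[(r[i] + r[i + 1]) / 2 for i in range(0, len(r), 2)] for r in image]
    high = [[(r[i] - r[i + 1]) / 2 for i in range(0, len(r), 2)] for r in image]
    return low, high

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def haar_2d_level(image):
    """One decomposition level; image width and height must be even."""
    h_low, h_high = split_rows(image)                # horizontal filtering
    # Vertical filtering is row filtering applied to the transpose.
    ll, lh = (transpose(b) for b in split_rows(transpose(h_low)))
    hl, hh = (transpose(b) for b in split_rows(transpose(h_high)))
    return {"LL": ll, "HL": hl, "LH": lh, "HH": hh}

# A 4 x 4 test image containing only a vertical edge (change along the horizontal).
image = [[0, 8, 8, 8] for _ in range(4)]
bands = haar_2d_level(image)

print(bands["HL"])   # [[-4.0, 0.0], [-4.0, 0.0]]  the edge appears here
print(bands["LH"])   # [[0.0, 0.0], [0.0, 0.0]]
print(bands["HH"])   # [[0.0, 0.0], [0.0, 0.0]]
```

Applying the same function again to the LL band would give the next, coarser level of the hierarchy.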

After the data has been decomposed and organized in this manner, the next step in the process
is coding the data to allow for the content accessibility described above. To accomplish this
objective, the present invention first defines a "region of interest", secondly, formulates a
"mask" to describe it and then encodes that information so that it becomes part of the
compressed data stream.

An important concept developed to perform this stage of the present system is the notion of
geometric progressive coding. When attempting to achieve region-based coding while preserving
scalability it is imperative to associate the order V (the magnitude of the resolution coefficients -
the MDH data) with the multiple region data (i.e., with relation R). This leads to a geometric
approach to the coding set out in Figure 3. In the prior art, the combinatorial approach (left) uses
a sample value (a zero in the transform coefficient plane) to predict the possible occurrence of a
group of zeros at a higher level of resolution. It is on this basis that the compactness in
representation is achieved. At the same time, it will be appreciated that any error occurring during
transmission at low levels of resolution will have increasingly severe repercussions at each level
of prediction.

In the geometric approach (right) adopted in the present invention, representational compactness
is achieved by using a geometric shape to cover a large set of samples (zeros) and then coding this
shape. In this approach, regions of interest in the MDH are represented in the form of geometric
objects, like regions and curves, and compact codes are then formulated to describe these
geometric objects. The compact coding of the geometric objects makes use of the leading-one
curve C in Figure 4. The advantages obtained by using this method of formulation and coding
include the fine description of regions, the compact representation of these regions, and the
robustness to the type of transmission errors described above.

Thus, given a subset of coefficients (Cij) in the MDH, the distribution of the absolute values of the
coefficients, regardless of the order they are scanned, contains three parts (Figure 4). The
leading-one curve C is composed of the first non-zero bit of the binary representation of all coefficients
when sought from the most significant bit. The refinement zone is composed of the binary bits of
all coefficients following the leading one. The zero zone is composed of all the zeros preceding
the leading one of all coefficients. Thus, if the total size of the coefficients is n*N bits (N
coefficients of n bits each), the area of the refinement zone is |X| bits, and the area of the zero
zone is |O| bits, then |X| + |O| = (n-1)*N bits since the length of the curve C is N.
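
The bit accounting above can be verified on a small example. The sketch assumes that every coefficient is non-zero and fits in n bits, so that each one contributes exactly one bit to the curve C; the function name and the sample values are hypothetical.

```python
def zone_sizes(coefficients, n_bits):
    """Return (zero_zone_bits, refinement_zone_bits) for n_bits-wide magnitudes.

    Assumption of this sketch: every magnitude lies in 1 .. 2**n_bits - 1,
    so each coefficient has exactly one leading-one bit.
    """
    zero_zone = refinement_zone = 0
    for c in coefficients:
        magnitude = abs(c)
        if not 0 < magnitude < 2 ** n_bits:
            raise ValueError("magnitude must be in 1 .. 2**n_bits - 1")
        leading_one = magnitude.bit_length() - 1   # bit index of the leading one
        zero_zone += n_bits - 1 - leading_one      # zeros above the leading one
        refinement_zone += leading_one             # bits below the leading one
    return zero_zone, refinement_zone

coeffs = [-37, 12, 5, 1]   # magnitudes 100101, 001100, 000101, 000001 in 6 bits
n = 6
zeros, refinement = zone_sizes(coeffs, n)

print(zeros, refinement)                          # 10 10
print(zeros + refinement == (n - 1) * len(coeffs))   # True
```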

In order to achieve lossless coding of this data the information for the curve C and for the
refinement zone must be precisely recorded. The performance of an encoder in terms of
compactness would then be determined by its ability to code the zero zone, or equivalently, to
code the curve C. In order to achieve the scalability in terms of order V, the curve C is
expected to be non-increasing in its height. This is achieved through a progressive partial
sorting process that is described below.

To return to the beginning of the process by which the multiple region data is created, the
preferred embodiment of the present invention contemplates three methods to determine a
region of interest. In Figure 5 we see that the system supports:

1. User-defined regions. In this scheme, the region is determined by either an interactive
process (i.e. where the user specifies the region of interest with an input device like a
mouse), or by another application program. A "mask" is then formulated based on this
user defined region. This method of region formulation is represented by Figure 5 a).

2. Tiling. In a tiling scheme, standard sized blocks of pixels are allocated and form the
regions. In JPEG for example, 8x8 blocks can be considered as the regions specified via
tiling. Tiling may be used for
very large images like those generated in computer aided design and manufacture. The tiling
method of region formulation is illustrated in Figure 5 b).

3. Automated Region Formulation. This automated process is represented by Figure 5 c). The
task of the automated region hierarchy formulation is to segment the MDH data or the original
image data into a hierarchy of geometric regions. In this invention a transformation-domain
segmentation scheme is developed. In the preferred embodiment of this process, the MDH
data is segmented into spatially disjoint regions by measuring their absolute values or by
measuring the "region importance", where region importance is a group measure of the overall
importance of all coefficients in a region of interest. In this invention we consider two types of
region importance: average importance, and weighted importance. The average region
importance is the mean value of the coefficient importance of all coefficients in that region,
and the weighted region importance is the weighted average of the coefficient importance of
all coefficients in the region.
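
The two region importance measures named in item 3 can be sketched as follows. The text does not say how coefficient importance or the weights are chosen, so the sketch assumes the importance of a coefficient is its absolute value and uses arbitrary example weights; all names are hypothetical.

```python
def average_region_importance(coefficients):
    """Mean coefficient importance, taking importance as |coefficient| (an assumption)."""
    return sum(abs(c) for c in coefficients) / len(coefficients)

def weighted_region_importance(coefficients, weights):
    """Weighted average of |coefficient|; the weights are supplied by the caller."""
    total = sum(w * abs(c) for c, w in zip(coefficients, weights))
    return total / sum(weights)

region = [8, -4, 0, 4]      # coefficients inside one region of interest
weights = [4, 2, 1, 1]      # hypothetical weights, e.g. favoring one location

print(average_region_importance(region))             # 4.0
print(weighted_region_importance(region, weights))   # 5.5
```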

The automated region formulation of the present invention is accomplished by using one of
two segmentation algorithms. The first of these is a full logarithmic scheme where threshold
values 2^(n-1), 2^(n-2), ..., 2^0 are used sequentially to order the MDH data, where it is known that the
maximum MDH coefficient (|Cij|) < 2^n.

The second segmentation algorithm is based on a partial logarithmic scheme. In this scheme, 
only certain powers of 2, determined by the expert user, are used as threshold values. 

After thresholding the MDH data with either scheme, each spatial location on the MDH plane
is marked with a unique label that relates to the corresponding threshold value. Thus, if "n"
threshold values are used in a scheme, the entire MDH plane is marked with n+1 distinct
labels. This set of labels forms the region masks.

In Figure 5 c) we see the results of the automated segmentation of image Lena. The MDH
coefficients generated during the multi-resolution decomposition stage thus fall into three
ranges. In the preferred embodiment of the present invention the ranges are 0 - 15, 16 - 31 and
32 - 64.
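
A minimal sketch of the thresholding step follows, using the two thresholds 16 and 32 implied by the three ranges above (a partial logarithmic scheme). Labeling each location with the number of thresholds its magnitude reaches is an assumption of this sketch; the text only requires that each label relate to the corresponding threshold. The function name and sample plane are hypothetical.

```python
def label_plane(plane, thresholds):
    """Label each location with how many thresholds its magnitude reaches.

    With n thresholds the labels run from 0 to n, i.e. n + 1 distinct labels.
    """
    return [[sum(abs(c) >= t for t in thresholds) for c in row] for row in plane]

thresholds = [16, 32]                  # powers of 2, as in a partial logarithmic scheme
plane = [[3, 17, 40],
         [-20, 0, 64]]                 # hypothetical MDH coefficients

mask = label_plane(plane, thresholds)
print(mask)   # [[0, 1, 2], [1, 0, 2]]
```

Passing every power of 2 below the maximum magnitude as a threshold would give the full logarithmic scheme instead.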

Recalling that the MDH data structure contains multiple resolution levels and multiple spatial
orientations, the segmentation of the MDH data could conceivably be achieved by applying a
common mask set to all resolution levels and all orientations; applying different masks to
different orientations while retaining a common mask for all resolution levels within each
orientation; applying different masks to different resolution levels and retaining a common
mask for all orientations at any given resolution level; or applying different masks to different
resolutions and orientations.

In the preferred embodiment of the present invention, the first approach has been selected
because of the self-similarity among different orientations. At any given resolution level, the
boundary information (information related to the busy areas or those with high contrast) is
contained in the sets HH1, HL1, and LH1. In general, since the sets HH, HL, and LH capture
band-pass features in different orientations, none of them alone gives a complete
description of boundaries at that resolution level. A proper determination of a boundary "event"
must occur when an event occurs in any one of the three orientations. The following operation
is therefore used for the common importance test at the resolution level 1.

H1 = max{HH1, HL1, LH1}.

That is to say, that importance of a region is determined by the maximum value occurring in 
any one of the three orientations at that location. 

An alternative to this operation is H1 = a * HH1 + b * HL1 + c * LH1, where a + b + c = 1.
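
Both forms of the common importance test can be sketched on small subbands. The sketch assumes the comparison is made on coefficient magnitudes at each location, which the text implies but does not state outright; the function names and sample subbands are hypothetical.

```python
import math

def common_importance_max(hh, hl, lh):
    """H1 as the largest magnitude across the three orientations at each location."""
    return [[max(abs(x), abs(y), abs(z)) for x, y, z in zip(r1, r2, r3)]
            for r1, r2, r3 in zip(hh, hl, lh)]

def common_importance_weighted(hh, hl, lh, a, b, c):
    """H1 as a * HH1 + b * HL1 + c * LH1 on magnitudes, with a + b + c = 1."""
    if not math.isclose(a + b + c, 1.0):
        raise ValueError("weights must sum to 1")
    return [[a * abs(x) + b * abs(y) + c * abs(z) for x, y, z in zip(r1, r2, r3)]
            for r1, r2, r3 in zip(hh, hl, lh)]

hh1 = [[1, -9], [0, 2]]
hl1 = [[4, 3], [-5, 2]]
lh1 = [[2, 2], [1, -7]]

print(common_importance_max(hh1, hl1, lh1))                          # [[4, 9], [5, 7]]
print(common_importance_weighted(hh1, hl1, lh1, 0.5, 0.25, 0.25))    # [[2.0, 5.75], [1.5, 3.25]]
```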

Other reasons for applying common masks for different resolutions and orientations include
the self-similarity at different resolution levels and the computational efficiency of only one
mask. That is, computing a common mask is generally computationally cheaper than
computing multiple masks.

The task of region shape coding is to find an accurate and compact code for the region masks
produced in the region formation step. Both the compactness and accuracy of the shape code
have a direct impact on the efficiency of the whole coding system. In the architecture of the
present invention multiple shape coding schemes are supported, but in the preferred
embodiment the following DCT-based region channel is used.

In this scheme, a region mask is coded by its Fourier transform characteristics. By applying a
low-pass filtering in the frequency domain the global shape of multiple region masks can be
encoded with high accuracy and with a small number of DCT coefficients. Figure 6 illustrates
a graphic example of the DCT-coded region masks as applied to the Lena image. By using the
DCT transform to describe the mask, a substantial [...]
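
As a rough illustration of coding a mask by a small number of DCT coefficients, the sketch
below transforms a square binary mask with an orthonormal type-II DCT, keeps only a
`keep` x `keep` block of the lowest-frequency coefficients, and rebuilds the mask. The choice of
a square low-frequency block as the low-pass filter, the 0.5 threshold used to return to a
binary mask, and all function names are assumptions made for the example; the source does not
specify them.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C (row k, column x), so that C * C^T = I."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def lowpass_mask(mask, keep):
    """Rebuild a square binary mask from its keep x keep lowest-frequency DCT terms."""
    n = len(mask)
    c = dct_matrix(n)
    ct = transpose(c)
    coeffs = matmul(matmul(c, mask), ct)            # forward 2-D DCT: C M C^T
    kept = [[coeffs[u][v] if u < keep and v < keep else 0.0
             for v in range(n)] for u in range(n)]  # discard high frequencies
    smooth = matmul(matmul(ct, kept), c)            # inverse 2-D DCT: C^T K C
    return [[1 if smooth[i][j] >= 0.5 else 0 for j in range(n)]
            for i in range(n)]


# 8x8 mask with a 4x4 square region in the middle.
mask = [[1 if 2 <= i <= 5 and 2 <= j <= 5 else 0 for j in range(8)]
        for i in range(8)]

coarse = lowpass_mask(mask, keep=3)   # shape described by 9 of 64 coefficients
exact = lowpass_mask(mask, keep=8)    # keeping every coefficient reproduces the mask
```

Lowering `keep` trades shape accuracy for a shorter shape code, which is the compromise the
paragraph above describes.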

In the case of MDH data, only one DCT is used to generate the common mask at the highest
resolution level. Other masks at lower resolution levels are obtained by down-sampling. Figure
7 illustrates the flow of data from the start of the [...] stage through the coding
of the region based data lists. This process, called Algorithm A50, is a method of bottom-up
region hierarchy formation and includes:

(1) Calculate H1 = max(LH1, HL1, HH1), i.e.,
for k = 1 to N, H1[k] = max(LH1[k], HL1[k], HH1[k]).

(2) Apply the region formation scheme to the common importance mask H1 to get
partition mask M1.

(3) Apply a low-pass filter to the DCT transformed mask M1 to get M1'.

(4) Down-sample M1' to get masks M2, M3, ..., at lower resolution levels
(see Algorithm A51 below).

(5) Apply the masks {M1, M2, ..., ML} to respective coefficient layers to segment the MDH
into regions.

After step (3) above, the process by which the mask at the highest resolution level (M1) is
converted for use at lower resolution levels is performed by Algorithm A51, illustrated in Figure
8.

Algorithm A51: Mask Down Sampling 

Assume theta 1 > theta 2 > theta 3. Assume regions in M1 are labeled by theta values.

For (i = 2, 3, ..., b)
    For (all x and y of Mi)
        Mi(x, y) = max {Mi-1(2x, 2y), Mi-1(2x, 2y+1), Mi-1(2x+1, 2y), Mi-1(2x+2, 2y+2)}

While there are other methods by which to obtain the masks for the lower resolution levels, the
down sampling algorithm (A51) given above precisely preserves the shape of regions at different
resolution levels. Further, the above algorithm is computationally efficient.
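
Mask down-sampling in the spirit of Algorithm A51 can be sketched as below. The sketch assumes
that each cell at level i covers the 2x2 block of cells from (2x, 2y) to (2x+1, 2y+1) at level
i-1, so the fourth term of the max is taken as (2x+1, 2y+1); that block shape is an assumption
about the listing, as are the function names and the use of plain integers as region labels
(a larger integer standing for a more important theta label).

```python
def downsample_mask(mask):
    """Halve a label mask: each output cell takes the largest label in its 2x2 parent block."""
    rows, cols = len(mask) // 2, len(mask[0]) // 2
    return [[max(mask[2 * x][2 * y], mask[2 * x][2 * y + 1],
                 mask[2 * x + 1][2 * y], mask[2 * x + 1][2 * y + 1])
             for y in range(cols)]
            for x in range(rows)]


def mask_pyramid(m1, levels):
    """Return [M1, M2, ..., M_levels], each derived from the previous one."""
    masks = [m1]
    for _ in range(2, levels + 1):
        masks.append(downsample_mask(masks[-1]))
    return masks


# 4x4 mask at the highest resolution level; labels 3 > 2 > 1 (made-up values).
m1 = [[1, 1, 2, 2],
      [1, 1, 2, 3],
      [1, 1, 1, 1],
      [1, 2, 1, 1]]

pyramid = mask_pyramid(m1, levels=3)
```

Because the maximum label wins, a small high-importance region is never lost when moving to a
coarser level; it can only grow relative to its neighbours.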

Referring again to Figure 1, the data has now passed through both the multiresolution
decomposition and the region formulation and coding. At this stage the data has been reorganized
on the basis of its graphic content, but while the region segmentation process preserves the shape
of regions at different resolution levels for all orientations, it does not preserve the value range
of coefficients in corresponding regions at different levels and orientations. In other words, the
relation R is inherited at different resolution levels and for all orientations, but the order V has,
in general, not been precisely preserved. The task of progressive sorting is to re-establish the
order V for all region channels.

The first step in the progressive sorting of the data is the scanning of the regions generated by 
the region formation and coding. As this data is scanned, a corresponding list of the MDH 
coefficients is created as they are encountered in the scanning process. It will be obvious to 
one skilled in the art that, depending upon the nature of the data to be scanned and converted 
into a linear list, efficiencies may be obtained by determining the optimum method of scanning 
the region data.
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Generally speaking two types of scanning orders are contemplated; linear scanning and 
scanning based on a principle of "region shrinking". The preferred embodiment of the present
invention uses a software switch to determine which of the two scanning strategies to
undertake. This switch characterizes the nature of the data and then implements the
appropriate strategy.

The first method of scanning the data generated in the region formation and coding is a simple
linear analysis and listing of each coefficient. In this strategy, the coefficients are scanned
beginning at the leftmost position of the top row of the region data and continuing row by row,
down to the rightmost location of the bottom row. This strategy, as applied to a particular region,
is illustrated in Figure 9(a). While the linear scanning strategy is easy to implement, a major
problem of this method is that it may destroy the descending or ascending order inherent in the
data and thus jeopardize the compactness of the final, resulting bit-stream. This is true in the case
of mountain ridge landscapes or similarly contoured shapes. For regions with fine patterns and
mild changes in value, however, linear scanning can be comparatively efficient.
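
A minimal sketch of the linear scan follows. The function name `linear_scan` and the
representation of the region mask and coefficient buffer as nested lists indexed
`[row][column]` are assumptions made for the example.

```python
def linear_scan(label, mask, in_buf):
    """List the coefficients of one region in raster order: top row first, left to right."""
    return [in_buf[j][i]
            for j, row in enumerate(mask)
            for i, value in enumerate(row)
            if value == label]


# A small "ridge": values rise toward the middle of each row (made-up data).
mask = [[0, 1, 1, 1, 0],
        [1, 1, 1, 1, 1]]
in_buf = [[0, 2, 7, 3, 0],
          [1, 3, 9, 4, 2]]

scanned = linear_scan(1, mask, in_buf)
```

On this ridge-shaped data the scanned list rises and falls within every row, which is the loss
of ordering the paragraph above warns about.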

The second strategy for scanning the region-based coefficients is one based on the principle of 
region shrinking. This method is illustrated in Figure 9 (b) and is set out, mathematically, in 
Algorithm A62 below. 

Algorithm A62.

Input: label L, mask[m][n], inBuf[m][n];
Output: outBuf[K].

Step 1.
    J0 = min {J: mask[J][I] = L};
    J1 = max {J: mask[J][I] = L};
    While (J0 <= J1) do {

Step 2.
        For (J = J0; J <= J1; J++) {

Step 2.1.
            While ((Find I0 = left {I: mask[J][I] = L}) = true) do {
                Find I1 = right {I: mask[J][I] = L};
                Append inBuf[J][I0] to outBuf[K++];
                mask[J][I0] = NIL;
                If (I1 <> I0) {
                    Append inBuf[J][I1] to outBuf[K++];
                    mask[J][I1] = NIL;
                }
            }
        }

Step 2.2. (Update J0 and J1.)
        J0 = min {J: mask[J][I] = L};
        J1 = max {J: mask[J][I] = L};
    }



Figure 10 further illustrates the region shrinking process. For many cases such as mountain ridge
landscape, this region-shrinking method of scanning can effectively and efficiently preserve the
magnitude order in the data.
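
The region-shrinking scan can be sketched as below. The sketch follows one reading of Algorithm
A62, in which each row of the region is consumed from its two ends inward, alternating between
the leftmost and the rightmost remaining cell. The function name, the `[row][column]` indexing
and the use of `None` for NIL are assumptions made for the example.

```python
NIL = None


def region_shrink_scan(label, mask, in_buf):
    """Scan one region by repeatedly taking the leftmost and rightmost cells of each row."""
    work = [list(row) for row in mask]   # copy, so the caller's mask is untouched
    out = []

    def labelled_rows():
        return [j for j, row in enumerate(work) if label in row]

    rows = labelled_rows()
    while rows:                                   # While (J0 <= J1)
        j0, j1 = rows[0], rows[-1]
        for j in range(j0, j1 + 1):
            while label in work[j]:               # an I0 can still be found
                i0 = work[j].index(label)                               # leftmost
                i1 = len(work[j]) - 1 - work[j][::-1].index(label)      # rightmost
                out.append(in_buf[j][i0])
                work[j][i0] = NIL
                if i1 != i0:
                    out.append(in_buf[j][i1])
                    work[j][i1] = NIL
        rows = labelled_rows()                    # update J0 and J1
    return out


# The same kind of ridge-shaped data: values rise toward the middle of each row.
mask = [[0, 1, 1, 1, 0],
        [1, 1, 1, 1, 1]]
in_buf = [[0, 2, 7, 3, 0],
          [1, 3, 9, 4, 2]]

shrunk = region_shrink_scan(1, mask, in_buf)
```

For this ridge the coefficients of each row now come out in ascending order, since the small
values at the edges are emitted before the large values at the crest.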

Whatever scanning order is used to produce a linear list L for a region R, sorting is necessary
in order to establish the order V. In the present invention, partial ordering up to the level of the
leading-one curve is undertaken. Therefore, given a list L = {C1, C2, ..., Cm}, i.e. the generated list
of decomposition coefficients, implement the following progressive coding algorithm:

Algorithm A620, Progressive Sorting 

Step 1. For every item Ci in L, output the n-th msb(Ci); 

Step 2. For those items with msb = 1, output the values following the msb, and remove them from
L;

Step 3. Let n = n-1 and go to Step 1.

This algorithm partially, not fully, sorts the list L up to the powers of 2. It is a progressive process to the
extent that the output data list can be truncated at any given point and the decoder will still have received
the most valuable information. Finally, it does not expand the list L; for complete, lossless sorting of L, the
overall length of the sorted output is the same as L.
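
One way to read Algorithm A620 is sketched below, together with a matching decoder. It
assumes that the coefficients are non-negative integers smaller than 2^(n+1), that "the n-th
msb" means bit plane n, and that the output is a flat list of bits; sign handling and
truncated streams are left out, and the function names are hypothetical.

```python
def a620_encode(coeffs, n):
    """Emit bit planes from n down to 0; finish each item as soon as its leading one appears."""
    remaining = list(coeffs)
    bits = []
    for plane in range(n, -1, -1):
        if not remaining:
            break
        # Step 1: output the current bit of every item still in the list.
        flags = [(c >> plane) & 1 for c in remaining]
        bits.extend(flags)
        # Step 2: for items whose leading one was just found, output the lower bits.
        for c, flag in zip(remaining, flags):
            if flag:
                bits.extend((c >> k) & 1 for k in range(plane - 1, -1, -1))
        remaining = [c for c, flag in zip(remaining, flags) if not flag]
    return bits


def a620_decode(bits, count, n):
    """Rebuild the values in the order their leading ones were emitted."""
    out, pos, remaining = [], 0, count
    for plane in range(n, -1, -1):
        if remaining == 0:
            break
        flags = bits[pos:pos + remaining]
        pos += remaining
        for flag in flags:
            if flag:
                value = 1 << plane
                for k in range(plane - 1, -1, -1):
                    value |= bits[pos] << k
                    pos += 1
                out.append(value)
        remaining -= sum(flags)
    out.extend([0] * remaining)      # items that never showed a leading one are zero
    return out


coeffs = [5, 1, 12, 0, 3]            # all below 2**4, so n = 3
stream = a620_encode(coeffs, n=3)
decoded = a620_decode(stream, count=len(coeffs), n=3)
```

Under this reading every item contributes exactly n + 1 bits, so the stream is no longer than a
plain fixed-width listing, and the decoder recovers the items grouped by the position of their
leading one, largest group first.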

The algorithm A620 encounters inefficiencies when many items possess significantly small values.
In this event, a remarkable amount of bit-budget is spent on recording the 0's preceding the
leading 1 of each item's binary representation. The following algorithm improves this performance
by determining and using a threshold value "b" to segregate these low value coefficients from
those with higher values.

Algorithm A621. Bi-Partition Progressive Sorting

Step 1. For a predetermined 0 <= b <= n, check for every Ci in L whether |Ci| < 2^b;
output to L1 for those items with greater-than-threshold values and to L2 for
those with smaller values;

Step 2. For those items in LI, apply algorithm A620, starting with n; 

Step 3. For those items in L2, apply algorithm A620, starting with b. 
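
The saving offered by the bi-partition can be estimated with a short bit-count sketch. It
assumes that A620 spends (starting plane + 1) bits on every item, and that Step 1 costs one
membership bit per item so that a decoder can tell L1 from L2; the membership bit and the
function names are assumptions, since the source does not say how the partition is signalled.

```python
def bipartition(coeffs, b):
    """Split the list at the threshold 2**b: L1 holds the large items, L2 the small ones."""
    threshold = 1 << b
    l1 = [c for c in coeffs if abs(c) >= threshold]
    l2 = [c for c in coeffs if abs(c) < threshold]
    return l1, l2


def bit_cost_a620(count, start_plane):
    """Bits emitted by A620 for `count` items when it starts at bit plane `start_plane`."""
    return count * (start_plane + 1)


def bit_cost_a621(coeffs, n, b):
    """Bits for the bi-partition scheme, including one assumed membership bit per item."""
    l1, l2 = bipartition(coeffs, b)
    membership_bits = len(coeffs)
    return membership_bits + bit_cost_a620(len(l1), n) + bit_cost_a620(len(l2), b)


coeffs = [200, 1, 90, 0, 3, 2, 1, 0]       # mostly small values, all below 2**8
plain = bit_cost_a620(len(coeffs), 7)      # A620 alone, starting with n = 7
split = bit_cost_a621(coeffs, n=7, b=2)    # A621 with threshold 2**2
```

In this made-up example six of the eight items fall below the threshold, so most of the leading
zeros that A620 would record are avoided.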






There are two basic requirements on the progressive sorting: (1) when the output bit-stream
of the sorting process is decoded, it should produce the data in the descending order of V; and (2)
when the bit-stream is truncated at any point such that only partial data is reconstructed, the
amount of information in the reconstructed data should be maximized.

Entropy Coding 

Again referring to Figure 1, it can be seen that the next stage in the system is the entropy 
coding of the data. Entropy coding is a lossless method of data compression well known in the 
art. It is based on the inherent nature of binary code and the repetition of like strings of data. It 
is based on a method of prediction. In the present invention, two different methods of entropy 
encoding have been used because of the statistical nature of the two types of data resulting
from the progressive sorting of the present invention. Type B data is that which forms the
leading-one curve while Type A data is for all of the data in the refinement zone beneath the
leading-one curve. As may be seen from Figure [...]

Multiplexing

The function pair of the multiplexing in the encoder system and the de-multiplexing in the decoder
system provides the encoder and the decoder with an interactive means for the flexible control of
the bit-rate and the quality of the compressed images.

The interactivity in bit-budget control is reflected by the fact that both the encoder and the
decoder have control over the bit-budget determination and allocation process. A base bit-
budget (BBB) is specified to and used by the multiplexer to determine the total number of bits of
a compressed bit-stream. In the demultiplexing process, a decoding bit-budget (DBB) can be used
to further selectively prune [...]

The functions of the multiplexer are

(1) given the base bit-budget (BBB) for encoding the entire image, determining the bit-
budget for each resolution level and region channel; and

(2) interleaving the data from different channels into a single bit-stream. Following the
truncation, the sorted, truncated data from different regions, orientations, and
resolution levels are packed together to produce the final bit-stream. The default order
for packing the data, illustrated in Figure 11, is:
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a. The data at different resolution levels are packed from the lowest resolution 
to the highest resolution, i.e., in the order of Level 5 -> Level 4 -> Level 3 -> 
Level 2 -> Level 1. 

b. Within each resolution level, no preferred order is specified for the three spatial
orientations. By default, the data are scanned in the order of HL -> LH -> HH.

c. Within a particular orientation at a given resolution level, regions are scanned
from the highest region label to the lowest label.
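
The default packing order in items a to c can be sketched as follows. The sketch assumes that
each channel is identified by a (level, orientation, region label) triple and that its data has
already been sorted and truncated to its share of the bit budget; the dictionary layout and the
function names are hypothetical, and the budget allocation itself is not modelled.

```python
ORIENTATIONS = ("HL", "LH", "HH")    # default orientation order within a level


def packing_order(num_levels, region_labels):
    """Channel keys in default order: lowest resolution level first, highest label first."""
    order = []
    for level in range(num_levels, 0, -1):             # Level N -> ... -> Level 1
        for orientation in ORIENTATIONS:
            for label in sorted(region_labels, reverse=True):
                order.append((level, orientation, label))
    return order


def multiplex(channels, num_levels, region_labels):
    """Concatenate the already truncated channel data into one bit list."""
    stream = []
    for key in packing_order(num_levels, region_labels):
        stream.extend(channels.get(key, []))
    return stream


order = packing_order(num_levels=2, region_labels=[1, 2])

# Made-up channel contents; channels that are missing contribute nothing.
channels = {(2, "HL", 2): [1, 0], (2, "LH", 1): [0], (1, "HH", 1): [1, 1, 1]}
stream = multiplex(channels, num_levels=2, region_labels=[1, 2])
```

Placing the lowest resolution level first means that a stream cut short still carries a coarse
version of the whole image.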

After a compressed bit stream has been created, the preferred embodiment of the present 
invention contemplates a decoding process that is able to recreate the image. Depending upon 
the bit budget and the steps taken during the creation of the compressed bit stream, the 
original image may be restored in complete fidelity to the raw image data or, alternatively, with
some loss of information.

To complement the multiplexer on the encoding side of the present system, a demultiplexing
component is included on the decoding side of the present invention and is illustrated in Figure
13. An added feature of the preferred embodiment of the present invention is the ability of the
user at the decoding end of the system to determine their own bit budget and to perhaps
truncate the data at an arbitrarily determined value. This "decoding bit budget" is determined
before the demultiplexing step and is illustrated in Figure [...]

Figure 14 illustrates the remainder of the decoding side of the present system. For the most part, the
decoding process simply follows the reverse steps that occurred on the encoding side of the system.

The functions of the demultiplexer (Figure 14) are

(1) unpacking the compressed bit-stream into separated data lists; and

(2) applying the decoding bit-budget (DBB) to truncate the data lists. In order to provide the
applications with a full spectrum of scalabilities in terms [...]
pixel precision [...]

Various alterations, modifications and adaptations can be made to the embodiments of the present
invention without departing from the scope of the invention, which is defined in the claims.

1. A region-based method for encoding and decoding digital still images to produce a
scalable, content accessible compressed bit stream comprising the steps:

decomposing and ordering the raw image data into a hierarchy of multi-resolution
sub-images;

determining regions of interest;

defining a region mask to identify regions of interest;

encoding region masks for regions of interest;

determining region masks for subsequent levels of resolution; and

scanning and progressively sorting the region data on the basis of the magnitudes of the
multi-resolution coefficients.

2. The method defined in claim 1, wherein the hierarchy of multi-resolution sub-images
are composed on the basis of a wavelet transformation.

3. The method defined in claim 1, wherein the hierarchy of multi-resolution sub-images
are composed on the basis of a Fourier-based transformation.

4. The method defined in claim 1, wherein the hierarchy of multi-resolution sub-images
are composed using raw image data.

5. The method defined in claim 1, wherein regions of interest are determined by way of
an automated process.

6. The method defined in claim 1, wherein regions of interest are determined by way of
user definition.

7. The method defined in claim 1, wherein region masks are encoded on the basis of a
Fourier transformation.

8. The method defined in claim 1, wherein region masks are encoded on the basis of a
wavelet transformation.

9. The method defined in claim 1, wherein region based data is scanned in a linear manner
to create a list of multi-resolution coefficients.

10. The method defined in claim 1, wherein region based data is scanned using a region
shrinking protocol to create a list of multi-resolution coefficients.

11. The method defined in claim 1, wherein the list of multi-resolution coefficients is
sorted using a progressive, partial sorting regime.

12. The method defined in claim 1, wherein the list of multi-resolution coefficients is
sorted using a progressive sorting regime using data divided on the basis of a
predetermined partition.

13. The method defined in claim 1, further comprising the step of a software switch
determining the optimum method of entropy coding.

14. The method defined in claim 1, further comprising the step of a multiplexing protocol
that assembles the compressed data from different region and resolution channels into
an integrated bit-stream enabling both the encoder and the decoder to selectively and
interactively control the bit budget and the quality of the compressed images.

15. An apparatus for the region-based encoding and decoding of digital still images that
produces a scalable, content accessible compressed bit stream comprising:

a means of decomposing and ordering the raw image data into a hierarchy of
multi-resolution sub-images;

means of determining regions of interest;
means of defining a region mask to identify regions of interest;
means of encoding region masks for regions of interest;
means of determining region masks for subsequent levels of resolution; and

a means for scanning and progressively sorting the region data on the basis of the
magnitude of the multi-resolution coefficients.

16. The apparatus defined in claim 15, wherein the hierarchy of multi-resolution
sub-images are composed using a wavelet transformation.

17. The apparatus defined in claim 15, wherein the hierarchy of multi-resolution
sub-images are composed using a Fourier-based transformation.

18. The apparatus defined in claim 15, wherein the hierarchy of multi-resolution
sub-images are composed using raw image data.

19. The apparatus defined in claim 15, wherein regions of interest are determined by way
of an automated process.

20. The apparatus defined in claim 15, wherein regions of interest are determined by way
of the user.

21. The apparatus defined in claim 15, wherein region masks are encoded using a Fourier
transformation.

22. The apparatus defined in claim 15, wherein region masks are encoded using a wavelet
transformation.

23. The apparatus defined in claim 15, wherein region based data is scanned in a linear
manner to create a list of multi-resolution coefficients.

24. The apparatus defined in claim 15, wherein region based data is scanned using a region
shrinking protocol to create a list of multi-resolution coefficients.

25. The apparatus defined in claim 15, wherein the list of multi-resolution coefficients is
sorted using a progressive, partial sorting regime.

26. The apparatus defined in claim 15, wherein the list of multi-resolution coefficients is
sorted using a progressive sorting regime using data divided on the basis of a
predetermined partition.

27. The apparatus defined in claim 15 that uses a software switch in determining the
optimum means of entropy coding.

28. The apparatus defined in claim 15, further comprising a multiplexing means that
assembles the compressed data from different region and resolution channels into an
integrated bit-stream enabling both the encoder and the decoder to selectively and
interactively control the bit budget and the quality of the compressed images.



A region-based system for encoding and decoding digital still images that produces a 
scalable, content accessible compressed bit stream and comprises the steps: 

decomposing and ordering the raw image data into a hierarchy of multi-resolution
sub-images;

determining regions of interest;

defining a region mask to identify regions of interest;

encoding region masks for regions of interest;

determining region masks for subsequent levels of resolution; and

scanning and progressively sorting the region data on the basis of the magnitude of the
multi-resolution coefficients.

Figure 5. Panel titles: Combinatorial Approach; Geometric Approach.

Figure 7. For all index pairs (i, j) compute: H1[i][j] = max(LH1[i][j], HL1[i][j], HH1[i][j]).

Figure 8. Flowchart boxes: START; get the number of levels b to do down-sampling; i = 1; for every
index pair (x, y) of Mi compute Mi(x, y) = max {Mi-1(2x, 2y), Mi-1(2x, 2y+1), Mi-1(2x+1, 2y),
Mi-1(2x+2, 2y+2)}; i = i + 1; END.

Figure 9. (a) Linear Scanning; (b) Region Shrinking.

Figure 10. Flowchart boxes: get L; K = 0; compute J0 = min {J: mask[I][J] = L} and
J1 = max {J: mask[I][J] = L}; decision with a "No" branch to END; J = J0; while an
I0 = Left {I: mask[I][J] = L} can be found do: find I1 = Right {I: mask[I][J] = L}; append
inBuf[J][I0] to outBuf[K++]; erase mask[J][I0] to NIL; if (I1 <> I0) do {append inBuf[J][I1] to
outBuf[K++]; erase mask[J][I1] to NIL}.